ABSTRACT



Utilization of neural network techniques to recognize and classify acoustic signals has long been 
pursued and shows great promise as a robust application of neural network technology. Traditional 
techniques have proven effective but in some cases are quite computationally intensive, as the 
sampling rates necessary to capture the transient result in large input vectors and thus large neural 
networks. This thesis presents an alternative transient classification scheme which considerably 
reduces neural network size and thus computation time. Parameterization of the acoustic transient to 
a set of distinct characteristics (e.g. frequency, power spectral density) which capture the structure
of the input signal is the key to this new approach. Testing methods and results are presented on 
networks for which computation time is a fraction of that necessary with traditional methods, yet 
classification reliability is maintained. Neural network acoustic classification systems utilizing the 
above techniques are compared to classic time domain classification networks. Last, a case study is
presented which looks at these techniques applied to the acoustic intercept problem.
I. INTRODUCTION



A. TRADITIONAL PROCESSING 

The purpose of this thesis is to present a new method for 
classifying extremely short duration unintentional acoustic 
transients, utilizing neural network computing methods. This 
thesis presents an acoustic transient classification scheme 
which serves to take advantage of the inherent feature 
extraction capability of neural networks. 

An acoustic transient is a transient wave which results 
from the sudden release of energy associated with any of a 
large number of events in the ocean environment. Examples 
include the snapping of the tail of a shrimp against its body 
as it seeks to propel itself, the rattle of two links of chain 
tethering a navigation buoy, and the stress incurred or 
released as the metal hull of a submarine is compressed or 
expanded during changes in depth. These types of transients 
are detectable with underwater pressure sensitive hydrophones 
but are often very difficult if not impossible to classify, 
owing to extremely short signal duration. 

Traditional acoustic transient signal analysis has relied 
on classic techniques of Fourier analysis. See Figure 1. These 
generally include sensing the analog signal, sampling the 
signal at some rate (typically just above the Nyquist rate),
feeding the now discrete signal to a Fast Fourier Transform
(henceforth referred to as FFT) machine, analyzing the signal
for frequency content, and finally comparing the signal 
against the characteristics of signals known to contain 
similar frequency content. 
Figure 1: Traditional Signal Classification



These techniques have proven to be feasible, although 
somewhat computationally intensive, for continuous analog and 
moderate duration transient acoustic signals. 

B. NEURAL PROCESSING 

In recent years neural networks have offered an 
alternative approach to pattern recognition and signal 
processing based on automated learning procedures. Neural 
networks are attractive as a means of classifying acoustic 
transients because they are capable of discovering features
and patterns of interrelated features which serve to define 
the corresponding class of a signal. Additionally this method 
of pattern classification is desirable because a neural 
network has an ability to learn this structure and thus is 
capable of generalizing to novel or new but similar patterns. 
This being said, most neural network researchers in this area 
have attempted to utilize time series data or its Fourier 
transformed frequency counterpart directly as input to the 
network classifier. This approach is certainly advantageous 
when viewed in light of the arguments previously suggested and 
when compared to the computation time and reliability of the 
systems utilizing methods displayed in Figure 1. However this 
method is not without difficulties of its own. Foremost among 
problems associated with this type of approach is the need to 
"find" and extract the transient within a much larger data 
field and then to properly center the data prior to 
presentation to the network. Others have studied this problem 
and a good discussion of workable extraction methods is 
contained in a master's thesis by Shipley [Ref. 1]. 
Additionally given that the extraction has been made 
successfully the resulting input data vector can itself be 
quite large, which of course leads to a larger neural network 
and thus longer computation time. As an example suppose that 
a 10 msec duration transient containing frequencies in the 
range 3-10 kHz is to be detected. By the Nyquist sampling
theorem:

$$f_s \geq 2 f_{max} \qquad (1)$$

Where

$f_s$ = The sampling frequency
$f_{max}$ = The maximum frequency contained within the signal

The sampling frequency for this case is 20 kHz. Sampled 
over 10 msec this results in 200 data points, necessitating a 
neural network input layer of 200 units and perhaps a total 
network size of 300 units. Although this is not computationally
unreasonable by today's computing standards, this thesis
proposes to show that this same signal can be reliably
classified with a neural network utilizing fewer than 40 units.
Additionally, the methods presented here do not suffer from
many of the limitations outlined above. Namely, there is no
need to center the data and, remarkably, network size is
independent of signal duration. Figure 2 represents a conceptual block
overview of the classification process described herein. This 
method stands in sharp contrast to that realized by classical 
methods such as those outlined in Figure 1. Note for example 
that although signal pre-processing is required, the human 
interface is gone, having occurred prior to signal pre- 
processing, in a less demanding environment. 
Figure 2: Neural Network Signal Classification
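
The sampling arithmetic in the 10 msec example of this section can be checked with a short sketch. It is only an illustration of Equation 1 taken at equality; the function name `nyquist_input_size` is hypothetical and does not come from the thesis.

```python
def nyquist_input_size(f_max_hz, duration_s):
    """Return the minimum sampling rate and the resulting number of samples.

    Assumption: Equation 1 is applied at equality (f_s = 2 * f_max), as in
    the worked example in the text.
    """
    f_s = 2.0 * f_max_hz                  # minimum sampling frequency, Hz
    n_samples = round(f_s * duration_s)   # points captured over the transient
    return f_s, n_samples


# The example from the text: content up to 10 kHz, 10 msec duration.
f_s, n = nyquist_input_size(f_max_hz=10_000.0, duration_s=0.010)
print(f_s, n)  # prints 20000.0 200
```

Doubling the transient duration doubles `n`, and with it the input layer of a time domain network, whereas a feature based input vector keeps the same length.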



C. OBJECTIVES 

This thesis produces a neural network transient acoustic 
signal classifier using commercially available software and 
hardware. This thesis utilizes data which has undergone signal 
pre-processing to parameterize the data into 31 individual 
features as input to the feature based neural classifier. 

Further, this thesis compares the performance of this 
feature based classifier with time and frequency domain neural 
classifiers. Based on this comparison a feature based network 
which is considerably reduced in size is built, tested and 
analyzed.

Finally a case study is presented which demonstrates one 
possible application of the neural computing analysis which is
done in the balance of the thesis. In this case study the 
neural computing concepts and ideas presented herein are 
applied to the active acoustic intercept problem. 

Elementary discussions of acoustic and neural computing
fundamentals as they relate to pattern recognition immediately
follow this introduction. These should provide the uninitiated
reader with enough neural network knowledge to comfortably
read the remainder of the thesis. The remainder of the thesis
is devoted to describing how the software tools were used to
analyze the signals, how the data were analyzed using the
neural network to prune down the size of the original feature
based network, and a side by side analysis of the new and
traditional neural network transient detection methods,
emphasizing how the smaller, more efficient network
performed.
II. ACOUSTIC AND NEURAL NETWORK FUNDAMENTALS



A. ACOUSTIC FUNDAMENTALS 

This thesis deals primarily with signal processing of 
passive acoustic transient data. Although standard signal 
processing techniques exist for acoustic data, surprisingly 
little has been written on passive acoustic transient data. 
Thus some of the analysis overview presented here is borrowed 
from active sonar signal processing which by its very nature 
deals with the question of transient processing, namely the 
acoustic transient associated with the return of an active 
sonar emission from an acoustically reflective object. 

When considering the processing of acoustic information in 
the ocean it is necessary to first consider the nature of 
sound in the ocean. The data analyzed in this thesis is 
transient noise produced from a moving source which is a fixed 
distance from a receiver which, in turn, listens through a 
background of noise. It is then relevant to look at the many 
difficulties associated with the detection of this signal. 

The nature of the general passive acoustic problem is well 
documented [Ref. 2]. A classical argument is one in which a
source and source level are defined. The many ways in which
energy from the source is lost as the sound propagates through
the ocean are then characterized. Finally the difficulties
associated with detection of a signal in the presence of
background noise are quantified. Urick provides an excellent
overview for the interested reader [Ref. 2].

Presented here is a specific discussion relevant to 
gathering and processing acoustic information in the ocean 
environment and a brief development of the nature of 
transients which allows direct substitution in the normal 
intensity based form of the passive sonar equations. 

The data utilized in this thesis were gathered by a 
passive acoustic pressure based receiver listening in the 
noise laden ocean environment. The hydrophone, in its simplest 
form, is an electroacoustic transducer which measures the 
ambient pressure field directly through surface displacement 
and converts the field fluctuations to a voltage series in 
time through the piezoelectric effect. The user is provided 
then with a voltage series which represents the pressure field 
as a function of time at the receiver. Of course the 
hydrophone is calibrated before being placed in the water and 
thus the voltage series can readily be returned to a pressure 
field through:

$$P_T = \frac{V_x}{K_0} \qquad (2)$$

Where

$V_x$ = Voltage recorded by the hydrophone
$K_0$ = The sensitivity of the hydrophone
$P_T$ = The pressure field

This conversion is convenient for a number of reasons.
First the pressure field can be processed to produce useful
parametric measurements such as signal power, signal mass 
density, signal amplitude, etc. Most importantly, the signal 
can now be related to a Sound Pressure Level (SPL): 

$$SPL = 20 \log_{10} \frac{P_e}{P_{ref}} \qquad (3)$$

Where

$P_e$ = Effective pressure = $P_T / \sqrt{2}$

Last the voltage or pressure time series can be 
transformed to the frequency domain through standard FFT 
techniques and a whole new series of parametric information 
can be extracted, such as power spectral density, spectral 
moments, etc. 

Now a short development of the acoustic nature of 
transients is presented as well as how these transients are 
transformed to relate them to the intensity based form of the 
passive sonar equations. 

Typically the sonar equations are formulated in terms of 
intensity in the radiated sound field. A more general approach 
specific to the characterization of a transient is to write 
the equations in terms of energy flux density, defined as the 
acoustic energy per unit area of the transient wavefront, 
which is the time integral of the instantaneous intensity. 
$$E = \int_0^\infty I \, dt = \frac{1}{\rho c} \int_0^\infty p^2 \, dt \qquad (4)$$

Where:

$I$ = Intensity
$c$ = Sound speed
$p$ = Acoustic pressure
$\rho$ = Density

In this case then the intensity of the transient can be
thought of as the mean square pressure of the wave divided by
the specific acoustic impedance and averaged over an interval
of time T:

$$I = \frac{1}{T} \int_0^T \frac{p^2}{\rho c} \, dt \qquad (5)$$

The quantity T is often hard to define for short duration
signals. However it can be shown that the intensity form of
the sonar equations can be used, provided that the source
level is defined as:

$$SL = 10 \log E - 10 \log t_e \qquad (6)$$

Where

$SL$ = Source level of the transformed transient
$t_e$ = The duration of the transient

This is convenient because it allows processing of short
duration transients utilizing traditional methods of sonar 
signal processing. This type of processing will prove 
convenient for time series analysis. [Ref. 2] 
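
As an illustration of Equations 3, 4 and 6, the following sketch evaluates them numerically on a sampled pressure record. It is a minimal sketch under stated assumptions, not the processing used for the thesis data: the density, sound speed and reference pressure constants are assumed typical values, the integral is approximated by a rectangle-rule sum, and the function names are hypothetical. No propagation loss or reference distance correction is attempted.

```python
import numpy as np

RHO = 1026.0    # assumed sea water density, kg/m^3
C = 1500.0      # assumed sound speed, m/s
P_REF = 1e-6    # assumed reference pressure, Pa (1 micropascal)


def energy_flux_density(p, dt):
    """Equation 4: E = (1 / (rho c)) * integral of p^2 dt (rectangle rule)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p ** 2) * dt / (RHO * C))


def transient_level(p, dt):
    """Equation 6: 10 log10(E) - 10 log10(t_e), t_e being the record length."""
    t_e = len(p) * dt
    return 10.0 * np.log10(energy_flux_density(p, dt)) - 10.0 * np.log10(t_e)


def sound_pressure_level(p_peak):
    """Equation 3 with effective pressure P_e = P_T / sqrt(2)."""
    return 20.0 * np.log10((p_peak / np.sqrt(2.0)) / P_REF)


# Synthetic stand-in record: a 10 msec, 5 kHz tone burst sampled at 20 kHz.
fs = 20_000.0
t = np.arange(200) / fs
amplitude = 2.0                                   # Pa, hypothetical
p = amplitude * np.sin(2.0 * np.pi * 5_000.0 * t)
level = transient_level(p, 1.0 / fs)
```

For this steady tone the result of Equation 6 equals ten times the logarithm of the mean intensity, `amplitude**2 / (2 * RHO * C)`, which is the sense in which the energy form can be substituted into the intensity based sonar equations.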

As stated in the introduction this thesis is about 
recognition of acoustic information. Accordingly it is 
necessary to provide the reader with some basic fundamentals 
in what neural networks are and do. It is hoped that this 
overview will provide the uninitiated reader with sufficient 
knowledge to extract that which he finds relevant to his own 
particular interests and endeavors. 



B. NEURAL NETWORK FUNDAMENTALS 

This section serves to provide the reader with an 
introduction to neural network computing fundamentals which 
stands alone and will facilitate the discussions in the follow 
on sections. 

In a strict formal sense a neural network is: 

"A parallel, distributed information processing 
structure consisting of processing elements (which can 
possess a local memory and can carry out localized 
information processing operations) interconnected via 
unidirectional signal channels called connections. Each 
processing element has a single output connection that 
branches ("fans out") into as many collateral connections 
as desired; each carries the same signal- the processing 
element output signal. The processing element output 
signal can be of any mathematical type desired. The 
information processing that goes on within each processing 
element can be defined arbitrarily with the restriction 
that it must be completely local; that is it must depend 
only on the current values of the input signals arriving 
at the processing element via impinging connections and on 
values stored in the processing element's local 
memory." [Ref. 3]

In a more practical sense a neural network consists of a
computer architecture which incorporates all of the following: 

1) A connection geometry for individual processing 
elements (henceforth referred to as neurons) 

2) A transfer function which tells the network how to map 
or pass data from one neuron to others. 

3) A learning rule which allows the network to improve 
its ability (learn by reducing error) to properly map the 
input to the output after repeated presentations of both. 

4) An algorithm for minimizing output error. 

1. CONNECTION GEOMETRIES 

Connection geometries are simply the manner in which 
individual neurons are connected to facilitate the transfer of 
data. Figure 3 provides an example of one such geometry. The 
most common type of artificial neural network consists of three
layers of neurons. A layer of input neurons is connected to a
layer of "hidden" neurons which is connected to a layer of
output neurons. Although there is more than one way to connect
this architecture, the networks considered in this thesis are
all fully interconnected, i.e. each neuron in a layer is
connected to every neuron in the layers immediately above
and below it. Thus Figure 3 consists of one input layer with
6 neurons, one hidden layer with 3 neurons, and one output 
layer with 2 output neurons. All the neurons are fully 
interconnected as shown in the figure and discussed above. 
Also shown in Figure 3 is a bias unit. This bias unit acts 
much like an electrical ground, maintaining a constant base
level of activity when the activity of the neuron falls below 
a selectable threshold value. 




Figure 3: Typical Fully Connected Neural Network 
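
The fully interconnected geometry of Figure 3 can be sketched as two weight matrices, one per pair of adjacent layers. This is an illustrative sketch only: the random weights are placeholders, the bias is modeled as a conventional additive term per neuron (an assumption), and the sigmoid is used because it is the transfer function adopted in this thesis.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid transfer function with unit gain."""
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate one input vector through a single hidden layer network."""
    hidden = sigmoid(w_hidden @ x + b_hidden)   # input layer -> hidden layer
    return sigmoid(w_out @ hidden + b_out)      # hidden layer -> output layer


# Layer sizes of Figure 3: 6 input, 3 hidden and 2 output neurons.
n_in, n_hidden, n_out = 6, 3, 2
rng = np.random.default_rng(0)
w_hidden = rng.normal(scale=0.5, size=(n_hidden, n_in))   # placeholder weights
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(scale=0.5, size=(n_out, n_hidden))
b_out = np.zeros(n_out)

y = forward(rng.normal(size=n_in), w_hidden, b_hidden, w_out, b_out)

# Full interconnection means every neuron pair in adjacent layers is linked:
n_connections = w_hidden.size + w_out.size   # 6*3 + 3*2
```

Each row of a weight matrix holds the connections arriving at one neuron, so full interconnection corresponds to matrices with no structurally missing entries.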

2. TRANSFER FUNCTIONS



One important feature of neurocomputing with neural 
networks is the manner in which data is passed and manipulated 
between neurons of one layer and neurons of another layer and 
within the neuron itself. This process of manipulating data 
within the neuron is accomplished mathematically by use of a 
transfer function. This function uses local memory and input 
to the neuron to produce the activation level for the neuron. 
Essentially the transfer function receives inputs as values 
stored in local memory corresponding to the current state of 
the neuron and it also receives input via the connections to 
the neuron. The transfer function then performs a mathematical 
operation on the inputs and produces two quantities, namely 
the output activation level of the neuron, i.e. that signal
which is passed on to other neurons via connections at the 
next update, and an activation level which is stored in local 
memory and corresponds to the new state of the neuron. 

Transfer functions can in principle be any of a variety of
mathematical functions which provide proper operation of the
network. Experience and experimentation have limited these
in practice, in most cases, to the sigmoid function, the
hyperbolic tangent function and other trigonometric functions,
and straight linear mapping. The most widely used
transfer function is the sigmoid function because of its
ability to map the real numbers $(-\infty, \infty)$ to the interval (0,1). The
work presented in this thesis was done with the sigmoid
function as a mapping transfer function. The sigmoid function
is defined as:



$$f(x) = \frac{1}{1 + e^{-ax}} \qquad (7)$$

This function has the properties that it is a bounded,
differentiable real function. It is monotonically
increasing for all real inputs and has a positive derivative
everywhere. Further, it is essentially linear for input values
which are near the central point of the function (input values
near zero). These properties make it convenient for use in
generalized delta rule learning which will be discussed in the
next section. Figure 4 illustrates graphically these features
and demonstrates the concept of mapping a large range of
inputs (-100,100) to a small range of outputs (0,1), one
feature which makes it desirable as a transfer function.
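
The properties listed above can be checked numerically. The sketch below implements Equation 7 and its derivative, which for the sigmoid has the closed form f'(x) = a f(x) (1 - f(x)); the gain `a = 1` and the function names are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x, a=1.0):
    """Equation 7: f(x) = 1 / (1 + exp(-a x))."""
    return 1.0 / (1.0 + np.exp(-a * np.asarray(x, dtype=float)))


def sigmoid_derivative(x, a=1.0):
    """f'(x) = a f(x) (1 - f(x)); positive for every finite input, although
    it can underflow to zero in floating point for very large |x|."""
    f = sigmoid(x, a)
    return a * f * (1.0 - f)


# Map the input range (-100, 100) used in Figure 4 onto the unit interval.
x = np.linspace(-100.0, 100.0, 2001)
y = sigmoid(x)

# Near zero the curve is close to the straight line 0.5 + x / 4.
linear_error = abs(float(sigmoid(0.1)) - (0.5 + 0.25 * 0.1))
```

The closed-form derivative is what makes the sigmoid convenient for gradient based learning: the slope needed by the generalized delta rule is available from the neuron's own output without a separate evaluation.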




3. NEURAL NETWORK LEARNING 
a. Learning Rules 

As has been mentioned previously, the purpose of
the network is to take a set of inputs in the form of features
represented as numbers in an input vector and map them to one
in a category of probable output types, represented as the
activation levels of the output neurons in an output vector.
These output levels can take on any values in the interval (0,1),
with values near zero representing low activity levels and
values near one corresponding to high activity levels for the
associated neuron. For the network to do this it needs to have
"learned" what the output categories are and what input vector
features are representative of a particular type of output
vector. There are a number of clever and innovative ways of
doing this [Ref. 3]. The method chosen for this work and that
which will now be discussed is known as supervised learning
utilizing the backpropagation algorithm which is based on the
generalized delta rule.

Simply put, the goal is to present the network
with exemplars of each type of input vector that it is
expected to learn and then "tell" it that these input vectors
correspond to a given output vector. A neural network, unlike
the human brain, is simply computer code; thus the way it is
"told" information is by way of numerical valued vector input.
Numbers which represent features common to an output category
type are presented to the network at the input layer. These
numbers are then mapped through the network to the output by
way of the transfer function operating on neurons and
connections to arrive at final values at the output neurons.
This process is then repeated a number of times for different
exemplars of the various output vector types. During this
"training" process the desired vector output is also provided
to the network. An error is then calculated for the process.
This error, in its simplest form, is the difference
between the "perfect" or "desired" output activity for the
given input and the actual output neuron activation level
calculated by the network. This error is then backpropagated
through the network, which adjusts itself to minimize the
error. The manner in which the error is backpropagated and the
way in which the network "adjusts" itself form the basis of
the learning occurring in the network.

b. Generalized Delta Rule and Backpropagation

The final concept which needs clarification is
the manner in which the network learns the associations
necessary to perform its feature based recognition. As
previously mentioned this is done by backpropagating the
output error to the input and repeating the training
presentation. Learning occurs in the form of adjustments of 
the weights representing the mathematical strength of 
connections between neurons. Through repeated presentations of 
the training vectors these weights are slowly adjusted to 
facilitate reduction in the output error. This is accomplished 
practically through use of the generalized delta learning rule 
to adjust the weights and the backpropagation algorithm to 
communicate the information back through the network. 

(1) Generalized Delta Rule. The generalized
delta learning rule states that the change in the weight of
the connection between the i-th and j-th neurons is proportional
to the product of the error input to the i-th neuron and
the activation of the j-th neuron, or:

$$\Delta w_{ij} = \epsilon \, \delta_i \, a_j \qquad (8)$$

Where

$\epsilon$ = a learning rate parameter which determines how fast the network changes the weights
$\delta_i$ = $(t_i - a_i) \, f_i'(net_i)$ for an output neuron
$t_i$ = the training input to the i-th neuron
$a_j$ = the activation of the j-th input neuron
$f'$ = derivative of the activation function with respect to a change in the net input to the neuron
$net_i$ = $\sum_j a_j w_{ij} + bias_i$

The bias term mentioned above is the same as
was described in association with the description of the
connection geometries of Figure 3. The $\delta_i$ given above is for
an output neuron. For a non-output neuron $\delta_i$ is given by:

$$\delta_i = f_i'(net_i) \sum_k \delta_k w_{ki} \qquad (9)$$

where the sum runs over the neurons k to which neuron i sends
its output.

It can be shown that this rule will find a
set of weights that drives the error arbitrarily close to zero
for every set of patterns in the training set if such a set of
weights exists. Such a set of weights will exist if, for each
input pattern target pair, the target can be predicted from a
linear combination of the activation of the inputs. [Ref. 4]
(2) Backpropagation. To complete the discussion
of how this new information is communicated to the network a
brief explanation of the backpropagation algorithm is
presented. The basic idea of the backpropagation method is to
combine a nonlinear system capable of making decisions with
the objective error function of Least Mean Squares and
gradient descent. The objective error function for Least Mean
Square error is:

$$E = \frac{1}{2} \sum_i (t_i - a_i)^2 \qquad (10)$$

To implement this idea one must be able to
compute the derivative of the error function with respect to
any weight in the network and then change the weight according
to the rule:

$$\Delta w_{ij} = -k \frac{\partial E}{\partial w_{ij}} \qquad (11)$$

The "k" in Equation 11 above is just a
proportionality constant.

The application of the backpropagation rule
then involves two phases. During the first phase the input is
presented and propagated forward through the network to
compute the output value for each neuron. This output is
then compared with the target, resulting in a $\delta$ term for each
output neuron. The second phase involves a backward pass
through the network (analogous to the initial forward pass)
during which the $\delta$ term is computed for each neuron in the
network. Once these two phases are complete, the weight error
derivatives (Equation 11) can be used to compute the actual
weight changes on a pattern by pattern basis, or they may be
accumulated over the entire ensemble of patterns. Additional
details can be found in "Parallel Distributed Processing" from
which the foregoing discussion was taken. [Ref. 4]
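
The two phases described above can be sketched for a single hidden layer network with sigmoid neurons. This is a minimal illustration under stated assumptions and not the commercial software used in this thesis: the training data are synthetic three-class stand-ins, weights are updated pattern by pattern, and the learning rate, layer sizes and function names are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid transfer function (Equation 7 with a = 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def train(inputs, targets, n_hidden=3, rate=0.5, epochs=300, seed=0):
    """Pattern-by-pattern backpropagation for one hidden layer."""
    rng = np.random.default_rng(seed)
    n_in, n_out = inputs.shape[1], targets.shape[1]
    w_hid = rng.uniform(-0.5, 0.5, size=(n_hidden, n_in))
    b_hid = np.zeros(n_hidden)
    w_out = rng.uniform(-0.5, 0.5, size=(n_out, n_hidden))
    b_out = np.zeros(n_out)
    history = []                       # summed squared error per epoch
    for _ in range(epochs):
        epoch_error = 0.0
        for x, t in zip(inputs, targets):
            # Phase 1: forward pass to get every neuron's activation.
            h = sigmoid(w_hid @ x + b_hid)
            a = sigmoid(w_out @ h + b_out)
            epoch_error += 0.5 * np.sum((t - a) ** 2)
            # Phase 2: backward pass. For the sigmoid, f'(net) = a (1 - a).
            delta_out = (t - a) * a * (1.0 - a)                 # output deltas
            delta_hid = h * (1.0 - h) * (w_out.T @ delta_out)   # hidden deltas
            # Weight change = rate * delta_i * a_j (generalized delta rule).
            w_out += rate * np.outer(delta_out, h)
            b_out += rate * delta_out
            w_hid += rate * np.outer(delta_hid, x)
            b_hid += rate * delta_hid
        history.append(epoch_error)
    return (w_hid, b_hid, w_out, b_out), history


def classify(x, weights):
    """Return the index of the most active output neuron."""
    w_hid, b_hid, w_out, b_out = weights
    return int(np.argmax(sigmoid(w_out @ sigmoid(w_hid @ x + b_hid) + b_out)))


# Synthetic stand-in data: three well separated classes, ten exemplars each.
rng = np.random.default_rng(1)
centers = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
labels = np.repeat(np.arange(3), 10)
inputs = centers[labels] + 0.05 * rng.normal(size=(30, 4))
targets = np.eye(3)[labels]            # one output neuron per class
weights, history = train(inputs, targets)
```

Accumulating the weight changes over the whole training set before applying them would give the ensemble variant mentioned in the text; the per-pattern form is shown only because it is shorter.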
III. FEATURE BASED NEURAL NETWORK CLASSIFIER



As discussed in the introduction, the goal of this thesis 
is to demonstrate a feature based acoustic transient 
classifier. This section describes the design and operational 
details for the feature based classifier. 

A number of design considerations and parameters play into 
the question of designing a neural network which can perform 
this type of classification task. These include: 

1) Characterization of input data sets. 

2) The type of network best suited to perform the 
classification task. 

3) The size of the network needed to perform the task. 

4) Decisions on training data and training time such that 
network performance is optimized. 

Each of these will now be discussed in some detail as they 
relate to the classification task at hand. 

A. INPUT DATA CHARACTERISTICS AND ANALYSIS 

Data used in this thesis consist of raw time series
voltage data for three different types of acoustic transients.
For discussion purposes for the remainder of this thesis these
transients will be referred to as type I, type II, and type
III transients. These transients were recorded at sea in the
presence of the type of background noise described in section
I. In addition to the raw time series data another set of
signal data was produced by signal processing to extract
relevant information features contained within an individual
record or transient. Unclassified examples would be such
things as frequency content, amplitude, density of the power
spectrum, etc. When necessary these features will be referred
to as features a, b, c, etc. All data were obtained from the
Naval Surface Warfare Center (NSWC) and all data preprocessing 
was done there. These data were processed by NSWC to 
characterize each transient event in terms of 45 different 
features. Some of the features however provide redundant 
information so that the final processed data set used in this 
portion of the thesis utilized only 31 of the features. 

The acoustic transient identification question is a matter
of pattern recognition. In other words, one could ask if there
is structure in transient type I which is different from that
in types II and III. Additionally one may ask whether there
are features in exemplar #1 of type I which are similar to the
features in all other type I transients. If this is the case
then a neural
network may be able to recognize and more importantly recall
patterns in this structure and thus distinguish between class
types. Further one hopes that there are unique features within
a data class which clearly distinguish it from other data
classes.

1. Euclidean Distance Analysis 

To address these questions, related to classification,
a substantial effort was made to characterize the data. With
data of this type (i.e. feature extracted) characterizing the
input data by class is not a trivial question. One technique
which was utilized in this research was to simply treat the
input data as vectors arranged on a 31-dimensional
hypersphere. This approach then allows the calculation of
euclidean distance (D) on the hypersphere from the tip of one
vector, say exemplar 1, to the tip of all other vectors in the
space.

$$D = \sqrt{\sum_i (x_i - y_i)^2} \qquad (12)$$

where $x_i$ and $y_i$ are the i-th components of the two vectors
being compared.
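
The distance calculation behind the plots that follow can be sketched as below. The feature matrix here is random stand-in data with the same shape as the processed set (31 features per exemplar); the actual NSWC features are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def euclidean_distances(reference, vectors):
    """Distance from one reference vector to every row of `vectors`."""
    reference = np.asarray(reference, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    return np.sqrt(np.sum((vectors - reference) ** 2, axis=1))


# Stand-in data: 150 exemplars, each described by 31 features.
rng = np.random.default_rng(1)
features = rng.normal(size=(150, 31))

# Distance from exemplar 1 to all exemplars, as plotted in a figure like 5.
d = euclidean_distances(features[0], features)
```

Sorting or plotting `d` against exemplar index exposes groupings of the kind discussed below, where clusters of similar distance suggest subgroups within a data type.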

The following four figures, Figures 5 through 8, 
illustrate euclidean distance for vectors in the data set. 

The first figure, Figure 5, represents the euclidean 
distance from a type I vector plotted against 150 vectors 
chosen at random and representing all data classes. The 
remaining three figures, Figures 6 through 8, represent one 
vector from each data type graphed as euclidean distance from 
the remaining vectors of its type in the data. Inspection of 
the graphs reveals considerable variability, especially in 
Figure 5, which represents all data types, indicating there 
are a number of different data classes within the entire data 
set. However, a closer look at Figures 6 through 8 shows that
the data can in fact be categorized into distinct classes. For
example for the type I data of Figure 6 there exist 5 distinct
groupings. The first grouping contains those 4 vectors with a
total distance less than 0.2 x 10^4, the next grouping occurs
between 0.4 x 10^4 and 0.6 x 10^4, the largest group is a set of
data centered near 0.95 x 10^4, a fourth group consists of
those points with distances between 1.1 x 10^4 and 1.5 x 10^4,
and finally the last group consists of those 6 vectors
represented as the large spikes with distances exceeding 1.8
x 10^4.



Figure 5: Euclidean Distance for all Data Types

This delineation is important because it points to the 
fact that the data can be characterized by a set of common 
features. Although only one vector of each type has been chosen to
illustrate the euclidean distance analysis, these vectors are
representative of the data set and euclidean distance plots
for other vectors in the data set provide the same analytical
results.
Figure 6: Euclidean Distance for Type I Data

Figure 7: Euclidean Distance for Type II Data

Figure 8: Euclidean Distance for Type III Data

Euclidean distance will be an important characteristic to 
consider when making up the final training and test data sets, 
as it is particularly important that all data subgroups within 
a given data type be represented in the training data set if 
the network is to perform recognition tasks on all of the test 
set satisfactorily. 

B. NEURAL NETWORK CONSTRUCTION 

1. Network Type and Size Considerations 

The next step in the classification task was to settle 
on a network type. This is an important neural network 
question and will certainly differ from task to task. When 
answering this type of question there simply is no substitute 
for domain knowledge. Knowledge of the nature of acoustics and
acoustic transients is the key to making the correct choice.
This thesis utilized a heteroassociative backpropagation
single hidden layer network to perform the classification
task. This type of network is particularly suited to pattern
recognition [Ref. 5].

The next question which must be addressed is the size 
of the network which is best suited to perform the task. For 
this portion of the analysis the size of the input layer to 
the neural network is fixed by the number of individual 
parameters which are used to characterize each exemplar in the 
data set. The original data contained 45 individual 
parameters or features, 14 of which were redundant or were 
used for data tags rather than to convey signal information, 
thus the final data set contained 31 individual parameters 
characterizing the data into one of three types. This fixed 
the input data layer size at 31 neurons. 

Next one must decide on the number of hidden layers 
and neurons which will enhance efficient and reliable network 
performance. Few theoretical studies are available to guide 
neural network practitioners in answering this important 
question. NeuralWare, Inc., a professional neural network 
engineering corporation, does provide some guidance [Ref. 5]. 
NeuralWare suggests that the number of hidden layer neurons 
is proportional to the ratio of the number of exemplars in the 
data set to the sum of the nodes in the input and output 
layers: 

$$H = \frac{d}{f\,(m+n)} \qquad (13)$$

where 

H = # of neurons in the hidden layer 
d = # of exemplars in the data set 
f = Arbitrary number between five and ten 
m = # of neurons in the output layer 
n = # of neurons in the input layer 

For the work cited here this number computed to three 
neurons in the hidden layer. A three hidden neuron network was 
built and tested but performed poorly. This guidance may be 
useable for very large data sets but proved to be of little 
use in the construction of a hidden layer for the work 
considered here. 
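
As a worked illustration of equation (13), the short Python sketch below evaluates the rule for the network considered here. The thesis does not state which values of d and f were used; the sketch assumes d = 458 (the full data set) and f = 5, which are the values that reproduce the three hidden neurons quoted above. The function name is hypothetical.

```python
def hidden_neurons(d, f, m, n):
    """Equation (13): H = d / (f * (m + n))."""
    return d / (f * (m + n))


# Assumed values: d = 458 exemplars, f = 5, m = 3 outputs, n = 31 inputs.
h = hidden_neurons(d=458, f=5, m=3, n=31)
print(f"H = {h:.2f}, rounded to {round(h)} hidden neurons")
```

With f = 10 the same rule gives a value between one and two, so under these assumptions three neurons is the largest hidden layer the guidance suggests.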

a. Singular Value Decomposition 

Recall from section II that a neural network 
learns by adjusting connection weights between neurons. These 
weights are stored in a weight matrix and updated during the 
training process. This weight matrix is nothing more than an 
array of numbers and like any other numerical array is 
characterized by certain properties. One such property of 
importance when investigating the hidden layer size is the 
number of significant (nonzero) singular values of the weight 
matrix. This number equals the rank of the matrix, i.e. the 
number of linearly independent vectors necessary to fully span 
the vector space. This number in turn provides an estimate of 
the number of independent features in the data and thus 
provides a good starting point for determining the number of 
neurons necessary in the hidden layer for network convergence. 

The data considered here was analyzed and 
decomposed to singular values utilizing MATLAB, a commercially 
available signal processing tool. MATLAB code was written to 
capitalize on the resident singular value decomposition 
feature. 

Figure 9 below represents the singular value 
decomposition of the data set. 







Figure 9: Singular Value Decomposition 



Scrutiny of Figure 9 shows that the data contains 
approximately 21 singular values. This then forms a basis for 
determining the number of individual independent elements in 
the data set that a hidden layer might be expected to extract. 
Note that the curve in Figure 9 continues to rise slowly even 
after 6000 iterations, indicating the presence of perhaps a 
few more singular values. The number of singular values 
extracted by the MATLAB software of course depends on an 
operator selectable threshold. Had a smaller threshold been 
used the number of values extracted would have been slightly 
higher. 
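
The dependence of the singular value count on the threshold can be sketched in a few lines. The example below uses Python with NumPy rather than the MATLAB code written for this thesis, which is not reproduced here. The function name, the relative threshold, and the synthetic data matrix are all assumptions made for illustration only.

```python
import numpy as np


def count_significant_singular_values(data, rel_threshold=1e-3):
    """Count singular values larger than rel_threshold times the largest one."""
    s = np.linalg.svd(data, compute_uv=False)  # returned in descending order
    return int(np.sum(s > rel_threshold * s[0]))


# Synthetic stand-in for the data set: 458 exemplars by 31 features,
# built from only 5 independent components (NOT the thesis data).
rng = np.random.default_rng(0)
data = rng.standard_normal((458, 5)) @ rng.standard_normal((5, 31))

print(count_significant_singular_values(data))
```

Lowering `rel_threshold` can only keep the count the same or raise it, which is the behavior described above for the operator selectable threshold.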

Networks containing 21 neurons in a single hidden 
layer and networks which distributed the 21 neurons between 
two hidden layers were built and tested. Results are reported 
below. 

Theoretical discussions of this subject suggest 
experimenting until satisfactory performance is achieved. 
Using the singular value decomposition above as a guide, 
experimentation was conducted which attempted to find the best 
number of hidden layer neurons. 

This experimentation led to a final network size 
of 31 input neurons, 25 hidden neurons in a single layer, and 
3 output neurons. This network was built, tested, and found to 
be efficient and reliable. Results of the performance of this 
network are discussed in the results portion of this section. 

C. TRAINING THE NEURAL NETWORK CLASSIFIER 

Often an important consideration in neural network 
training and performance is the content of the training file 
relative to the test file and the length of training time 
required to ensure satisfactory network performance. These 
issues will now be addressed. 

The fundamental performance test that a neural network 
must pass is an ability to learn and then recall the entire 
data set. This is important because failure of the network to 
be able to do this may point to inconsistent or mislabeled 
data, the wrong type of network for the task, or simply a 
problem which is not suitable for a neural network to solve. 
The network described above satisfactorily learned and 
recalled the 458 exemplar data set with 100% accuracy. With 
this achieved, it was then necessary to break the data set up 
into training and test sets. 

The first data split consisted of placing the first half 
of the 458 exemplars in a training file and the second half of 
the 458 exemplars in a test file. Performance for the network 
trained on the first 229 exemplars and tested on the last 229 
exemplars was satisfactory but not optimum. Results of this 
testing are discussed below and compared to other networks in 
Table 2. 

The next step in training and test set construction was to 
split the data in half by random selection, hoping that enough 
exemplars of all data classes within a type would exist in 
both sets to allow for satisfactory performance. This 
delineation did in fact result in better performance. The 
network still however was unable to recognize a small 
percentage of all data types. Further, these results led to 
questions concerning characterization of the data set within 
exemplar types. These questions were for the most part resolved 
by the use of Euclidean distance as a class indicator. Having 
determined, through this analysis, that many different data 
classes existed within a given data type, the question still 
remained as to whether enough unique features existed to allow 
a neural network to separate data by type during training and 
recognition. 
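
The difference between the sequential and random splits described above can be sketched as follows. This is a minimal Python illustration, not the procedure actually used to build the data files; the function names and the two-class label list are hypothetical and exist only to show why a sequential split can leave a class out of the training set.

```python
import random


def sequential_split(items):
    """First half to the training set, second half to the test set."""
    half = len(items) // 2
    return items[:half], items[half:]


def random_split(items, seed=0):
    """Shuffle a copy of the items, then split the shuffled list in half."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical data set in which exemplars are stored grouped by class.
labels = ["A"] * 229 + ["B"] * 229

train_seq, test_seq = sequential_split(labels)
train_rnd, test_rnd = random_split(labels)
print(sorted(set(train_seq)), sorted(set(train_rnd)))
```

In this contrived case the sequentially built training set never sees class "B", while the randomly built training set contains both classes.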

Individual misclassifications were then examined and a few 
more exemplars of odd or infrequent data classes were moved 
from the test set and placed into the training set, and the 
network was again tested. This network performed quite well, 
and its performance along with a comparison of results 
obtained from the other networks mentioned above are discussed 
in the results portion of this section. 

Finally, the last consideration relative to network 
training was to find the training time that resulted in 
optimum network performance, characterized by the fewest 
misclassifications in the shortest possible training 
time. A procedure similar to that followed by Hecht-Nielsen 
was utilized to address this performance issue [Ref. 3]. 
Figure 10 below shows network performance graphed as the 
number of misclassifications versus the number of training 
cycles. 

Figure 10: Optimum Network Training Time 

In training a neural network, the network is first trained 
on and then subsequently tested on the training set. This 
demonstrates that the network is suitable for the task at 
hand. During this type of training the recognition error 
should continue to decrease indefinitely. However, when 
training on the training set and then testing on the test data 
one finds that the error will eventually reach a minimum, and 
then begin to increase again as the network simply begins 
"memorizing" the input data set. It is this minimum in the 
test set curve which represents the point of optimum training 
time. As seen in Figure 10, this occurred for this 
network at approximately 220,000 cycles of training. 
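
Locating the optimum training point amounts to finding the minimum of the test set error curve. The sketch below shows the idea in Python. The checkpoint values are invented for illustration and are not read from Figure 10, and the function name is hypothetical.

```python
def optimum_training_point(cycles, test_errors):
    """Return (cycles, errors) at the minimum of the test-set error curve."""
    best = min(range(len(test_errors)), key=lambda i: test_errors[i])
    return cycles[best], test_errors[best]


# Illustrative checkpoints only: the test error falls, bottoms out, then
# rises again as the network starts memorizing the training set.
cycles = [10, 20, 30, 40, 50]
test_errors = [9, 6, 4, 5, 7]

print(optimum_training_point(cycles, test_errors))
```

If several checkpoints tie for the minimum, this sketch returns the earliest one, which is consistent with preferring the shortest training time.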

D. RESULTS: TESTING THE FEATURE BASED NETWORK 

A number of different networks have been described in this 
section. Comparative results for four of these networks are now 
presented in tabular form. These networks are: 

1) A 31x25x3 network which was trained on a 50/50 data 
split with the data being selected sequentially. 

2) A 31x25x3 network which was trained on the data split 
50/50 again but this time the data split was made by 
random selection. 

3) A 31x21x3 network which was trained on the final data 
split. This data split consisted of a 50/50 training/test 
split in data, with the data being selected at random. 
After the random data selection, Euclidean class analysis 
was done on both sets and some additional exemplars were 
moved from the test to the training set to ensure all 
classes of data were included in the training data set. 

4) A 31x25x3 network trained on the final data set, i.e. 
the same data set used in network #3. 

Before presentation of results it should be noted that 
each network was trained to the same standard. This was done 
by training Network 4 to the optimum point as discussed in the 
network training section above, and noting the rms error for 
the output neurons. Networks 1 and 2, being the same size, 
were then trained to the same number of cycles. Network 3 
being smaller in size was trained to the same rms error. 

The final data set for the best network (network # 4) 
consisted of the data breakdown shown in Table 1, and the 
testing results for these feature based networks are 
summarized in Table 2 below. 

TABLE 1: DATA BREAKDOWN BY TYPE IN FINAL DATA SET 

# of Exemplars by Data Type    Training Set    Test Set 
Data Type I                    115             86 
Data Type II                   54              33 
Data Type III                  90              80 

TABLE 2: RESULTS FOR FOUR FEATURE BASED NEURAL NETWORKS 

Recognition percentages              Type I Data      Type II Data     Type III Data 
                                     (#correct/86)    (#correct/33)    (#correct/80) 
Network # 1 (31x25x3) Seq. Data      0.26 (22/86)     0.55 (18/33)     0.53 (42/80) 
Network # 2 (31x25x3) Random Data    0.89 (76/86)     0.87 (29/33)     0.92 (74/80) 
Network # 3 (31x21x3) Final Data     0.71 (61/86)     0.71 (23/33)     0.96 (77/80) 
Network # 4 (31x25x3) Final Data     0.92 (79/86)     0.94 (31/33)     0.95 (76/80) 

Some analysis of these results is now in order. Comparison 
of rows three and four shows improvement when training on the 
same data set with a network which contains 25 vice 21 neurons 
in the hidden layer. This is evident by comparing the improved 
recognition percentages in row four (.92 for type I data) over 
those in row three (.71 for type I data). This suggests that 
there are more than 21 independent features in the data which 
the network is using to fully characterize and classify the 
data. 

Recall that singular value analysis indicated that the 
number of units in the hidden layer should be of the order of 
21. Good performance was obtained with a network of 25 hidden 
units. 

Next compare rows one and two of Table 2. Here we see 
quantitatively the importance of random data selection in data 
enhancement. Compare the improved recognition percentages in 
row two (0.89 for type I data), where data was selected 
randomly to form training and test sets, to that in row one 
(0.26 for type I data), where data was formed by splitting the 
whole data set in half sequentially. Random selection clearly 
improves the likelihood of including all data classes within 
a data type. 

Last consider rows two and four. The 3% improvement shown 
in the recognition percentages of network four (0.92 for type 
I data) over network two (0.89 for type I data) is a direct 
result of the Euclidean distance analysis on data class 
structure. This improvement was realized by using Euclidean 
distance to ensure that exemplars of all data subclasses 
within a type were included in the training set. 
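
The row comparisons above are made on the type I column. The same comparison can be made over all three types by summing the #correct counts listed in Table 2. The Python sketch below does this arithmetic. The pooled figure is a derived quantity that the thesis itself does not tabulate, and the variable and function names are hypothetical.

```python
# Correct classifications (type I, type II, type III) copied from Table 2.
correct = {
    1: (22, 18, 42),  # 31x25x3, sequential split
    2: (76, 29, 74),  # 31x25x3, random split
    3: (61, 23, 77),  # 31x21x3, final data
    4: (79, 31, 76),  # 31x25x3, final data
}
test_sizes = (86, 33, 80)  # test exemplars per type (199 in total)


def overall_accuracy(counts, sizes):
    """Pooled fraction of test exemplars classified correctly."""
    return sum(counts) / sum(sizes)


for network, counts in correct.items():
    print(network, round(overall_accuracy(counts, test_sizes), 3))
```

On this pooled measure network four remains the best of the four networks.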

The implications of the success of network four and a 
comparison with other networks considered in this thesis are 
discussed at length in the final section of this thesis. 

IV. TIME AND FREQUENCY DOMAIN NEURAL NETWORK CLASSIFIERS 



Having considered the detection of short duration acoustic 
transients by neural computing methods in "feature space" it 
is instructive for comparative purposes to consider detection 
of these transients in the time and frequency domains. 

A. TIME DOMAIN NEURAL NETWORK CLASSIFIER 

Recall that the original data for this thesis was obtained 
by recording the analog voltages in a continuous time series 
from a waterborne buoy. This data was then sampled at a fixed 
sampling rate (i.e. digitized). The acoustic transients were 
then electronically "snipped" from the digital recording and 
processed to parameterize them into 31 distinct features. This 
section of the thesis considers the detection and 
classification of the original digitized time series data. 

1. Time Domain Data Analysis 

Each snipped time series contains within it the 
acoustic transient of interest. See Figure 11 below. 
Figure 11 is a typical type I transient time series 
record. It consists of 3000 points of raw data representing 
one acoustic transient and the background noise which 
surrounds it. As is clearly evident from Figure 11, most of 
the record consists of mere background noise. It is neither 
necessary nor desirable to present the majority of this 
background noise to a neural network. 

Figure 11: Type I Transient; Full Time Series 



One significant disadvantage of doing so is that 
background noise is common to all transient types and thus 
provides no new information by which the network can 
discriminate between types in the classification process. 
Additionally, the length of the record determines the number of 
input neurons to the neural network. Network size and, more 
importantly, training time are significantly reduced by removal 
of this noise. 

Figure 12 below is the same type I transient (the 2nd 
peak in Figure 11). In Figure 12 just the 150 points on either 
side of the transient peak have been retained. 







Figure 12: Type I Transient; Discrete Time Series 



This representation of the data retains the essential 
information relevant to classification of the transient but is 
much reduced in size, and thus allows a neural network 
classifier to be trained in a fraction of the time required to 
train on the full record. Figures 13 and 14 below present 
type II and type III transients for comparative 
purposes. Close inspection of Figures 13 and 14, when compared 
to Figure 12, reveals subtle but important differences in the 
structure of the signals. 

Figure 13: Type II Transient; Discrete Time Series 

Figure 14: Type III Transient; Discrete Time Series 

These differences are more marked in the frequency 
domain and will be discussed in detail later. However, note 
that the type I transient shows a distinct and sharp rise 
followed by a steady decay, which is characteristic of an 
exponentially damped signal. Compare this to the type II and 
type III transients, which show more gradual rises. These latter 
types of transients seem to build more slowly to peak values 
and then decay slowly, as opposed to the sharp burst of energy 
followed by decay that is characteristic of the type I transient. 
It is features such as these that the neural network will use to 
distinguish between the types of transients. 

2. Training and Test Set Data Construction 
a. Training The Network 

Next it is relevant to consider the distribution 
of the training and test data sets. A detailed discussion of 
how data can in general be split was covered in section III. 
Recall from section III that the final data set was split into 
a training set consisting of 259 exemplars and a test set of 
199 exemplars. NSWC graciously provided, at the author's 
request, all 458 of the feature based exemplars and 60 
exemplars of time series data. The 60 time series data 
exemplars (Figure 11 represents one such exemplar) represent 
the time series from which 60 of the 458 exemplars of feature 
based data were extracted. Thus, as performance comparison of 
neural networks in feature space, the time domain, and the 
frequency domain was a stated goal of this thesis, training 
and test data sets in the time and frequency domains were 
split to ensure that their feature based counterparts remained 
in the same data set, either training or test. That is, if a 
data vector was in the feature based training set and it was 
one of the 458 vectors for which time series data existed, 
then its time series data also went in the time series 
training set, and likewise for data in the test set. As the 
training data set in feature space was larger than the test 
data set, this led to a somewhat disproportionately large 
training data set in the time domain as well. One vector had 
to be eliminated from the time series data set, leaving the 
remaining 59 vectors in the time domain to be distributed as 
follows: 



TABLE 3: TIME SERIES DATA BREAKDOWN 

# of Exemplars        Training Set    Test Set 
Type I Exemplars      24              15 
Type II Exemplars     5               4 
Type III Exemplars    7               4 
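
The matching rule used to build the time series sets, in which each time series exemplar follows its feature based counterpart, can be sketched in Python as follows. The exemplar identifiers and the lookup table are hypothetical, since the thesis does not describe how exemplars were indexed.

```python
def matched_split(time_series_ids, feature_set_of):
    """Place each time series exemplar in the same set as its feature vector."""
    train = [i for i in time_series_ids if feature_set_of[i] == "train"]
    test = [i for i in time_series_ids if feature_set_of[i] == "test"]
    return train, test


# Hypothetical identifiers: which set each feature based exemplar landed in.
feature_set_of = {"ex001": "train", "ex002": "test", "ex003": "train"}
time_series_ids = ["ex001", "ex002", "ex003"]

train_ids, test_ids = matched_split(time_series_ids, feature_set_of)
print(train_ids, test_ids)
```

Because the assignment is inherited rather than redrawn, the proportions of the time series sets simply mirror those of the feature based sets.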



b. Results: Testing the Time Domain Network 

Several networks were built and tested on the time 
domain data. All performed poorly. The network showing the 
highest success was a backpropagation multilayer network with 
300 neurons in the input layer, 150 in the first hidden layer, 
20 neurons in a second hidden layer and finally 3 output 
neurons. This network was only able to correctly classify 60% 
of type I transients, 45% of type II transients and none of 
the type III transients. Although disappointing in performance 
this network did lead to some understanding of the factors 
which may make detection and classification tasks difficult 
for a neural network. Others studying this problem, i.e. 
transient pattern recognition in the time domain using real 
world data, have had trouble with consistently good 
recognition [Ref. 1]. The reasons for some of these 
difficulties will now be discussed. 
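
For a sense of scale, the number of connection weights in this time domain network can be compared with that of the 31x25x3 feature based network of section III. The Python sketch below counts weights between successive fully connected layers. It assumes full connectivity and ignores bias weights, neither of which is stated in the thesis, and the function name is hypothetical.

```python
def connection_count(layers):
    """Weights between successive fully connected layers (biases excluded)."""
    return sum(a * b for a, b in zip(layers, layers[1:]))


feature_net = connection_count([31, 25, 3])            # 31*25 + 25*3
time_domain_net = connection_count([300, 150, 20, 3])  # 300*150 + 150*20 + 20*3

print(feature_net, time_domain_net)
```

Under these assumptions the time domain network carries 48,060 weights against 850 for the feature based network, a difference of more than fifty to one.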

3. The Artificial Time Domain Network 

In investigating the difficulties associated with this 
classification task, one has to first answer the question: "Is 
this task suitable for neural networks?" In the present case 
this translates to: "Can a neural network learn acoustic 
transient patterns in the time domain?" 

In contrast to the problems mentioned above some 
researchers have studied this problem and produced excellent 
results [Ref. 6] [Ref. 7]. To help answer the question in the 
preceding paragraph and to sort out why one task is achievable 
while another is often not, artificial acoustic transients 
were built to serve as test and training vectors which could 
be easily manipulated for investigative purposes. 

a. Construction of the Data Set 

Figure 15 below shows an artificial transient 
generated for use in the following investigation. Figure 15 is 
labeled as a type I transient. It was built with the original 
actual type I transient serving as a model, and comparison 
between the two shows some similarity. Comparison with Figure 
12 reveals that both transients are preceded by background 
noise, then jump suddenly to a peak value, and then decay 
exponentially. Both show randomness but also some periodicity. 
Figures 16 and 17 below are exemplars of the artificial type 
II and type III transients. These also show some similarity to 
their real counterparts, as they were built with a build and 
decay vice burst and decay structure in mind, and are clearly 
distinct from one another. 

Figure 15: Type I Transient; Artificial Data 

Figure 16: Type II Transient; Artificial Data 

Figure 17: Type III Transient; Artificial Data 
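
The two envelope shapes described above, burst and decay versus build and decay, can be sketched with a few lines of Python/NumPy. The generating formulas for the artificial transients are not given in the thesis, so every function name, waveform choice, and parameter value below (exponential and Gaussian envelopes, a cosine carrier, the noise level) is an assumption made purely for illustration.

```python
import numpy as np


def burst_and_decay(n=300, onset=150, freq=0.1, tau=30.0, noise=0.05, seed=0):
    """Background noise, a sudden jump to peak, then exponential decay."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    envelope = np.where(t >= onset, np.exp(-(t - onset) / tau), 0.0)
    signal = envelope * np.cos(2 * np.pi * freq * (t - onset))
    return signal + noise * rng.standard_normal(n)


def build_and_decay(n=300, center=150, freq=0.1, width=30.0, noise=0.05, seed=0):
    """Background noise with a gradually rising and falling envelope."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    envelope = np.exp(-0.5 * ((t - center) / width) ** 2)
    signal = envelope * np.cos(2 * np.pi * freq * (t - center))
    return signal + noise * rng.standard_normal(n)


type_i_like = burst_and_decay()
type_ii_like = build_and_decay()
print(type_i_like.shape, type_ii_like.shape)
```

Vectors of this kind are easy to manipulate, for example by varying `noise`, `tau`, or `width`, which is the property that makes artificial data useful for the investigation that follows.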






